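Named entity recognition is one of the core tasks of information extraction. Word ambiguity and word abbreviations are important causes of low recognition rates for named entities. In this paper, we propose a named entity recognition model, WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia), which incorporates the idea of contrastive learning. The model first trains on sentence pairs from the text, computes the cosine similarity between word pairs in each sentence pair, and uses this similarity to fine-tune the BERT model used for named entity recognition, thereby alleviating word ambiguity. The fine-tuned BERT model is then combined with a BiLSTM-CRF model to perform the named entity recognition task. Finally, the recognition results are combined with prior knowledge (such as a knowledge graph) to alleviate the low recognition rate caused by word abbreviations. Experimental results show that our model outperforms other similar models on the CoNLL-2003 English dataset and the OntoNotes V5 English dataset.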
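The word-pair similarity step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify how word vectors are obtained or how the similarity feeds into fine-tuning, so the toy embeddings and the helper names `cosine_similarity` and `word_pair_similarities` are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_pair_similarities(sent_a, sent_b):
    """Map every cross-sentence (word_a, word_b) pair to its cosine similarity.

    sent_a and sent_b are dicts from word to embedding vector (hypothetical inputs).
    """
    return {
        (wa, wb): cosine_similarity(va, vb)
        for wa, va in sent_a.items()
        for wb, vb in sent_b.items()
    }


# Toy 3-dimensional embeddings, invented purely for this example.
sent_a = {"apple": [1.0, 0.0, 1.0], "pie": [0.0, 1.0, 0.0]}
sent_b = {"Apple": [1.0, 0.0, 0.9], "Inc": [0.2, 0.1, 1.0]}
sims = word_pair_similarities(sent_a, sent_b)
```

In a contrastive setup, pairs with high similarity would typically be pulled together and dissimilar pairs pushed apart during fine-tuning; the exact loss is not given in the abstract.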
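Differentiable architecture search (DARTS) has received considerable attention in recent years, mainly because it significantly reduces computational cost through weight sharing and continuous relaxation. However, more recent works find that existing differentiable NAS techniques struggle to outperform naive baselines, yielding deteriorating architectures as the search proceeds. Rather than directly optimizing the architecture parameters, this paper formulates neural architecture search as a distribution learning problem by relaxing the architecture weights into Gaussian distributions. By leveraging natural-gradient variational inference (NGVI), the architecture distribution can be easily optimized on top of existing codebases without incurring additional memory or computational cost. We demonstrate how differentiable NAS benefits from Bayesian principles, which enhance exploration and improve stability. Experimental results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmark datasets confirm the significant improvements the proposed framework can make. In addition, instead of simply applying argmax to the learned parameters, we further leverage recently proposed training-free proxies in NAS to select the best architecture from a group of architectures drawn from the optimized distribution, achieving state-of-the-art results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmarks. Our best architecture in the DARTS search space also obtains competitive test errors of 2.37%, 15.72%, and 24.2% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively.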
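The selection step described above (drawing several architectures from the learned Gaussian distribution and choosing among them with a training-free proxy, rather than taking a single argmax) can be sketched as follows. This is an illustrative sketch only: the operation list `OPS`, the per-edge mean/standard-deviation layout, and the `proxy_score` callable are all hypothetical, and the NGVI optimization that would produce `mu` and `sigma` is not shown.

```python
import random

# Hypothetical candidate operations for each edge of a cell.
OPS = ["skip", "conv3x3", "conv1x1", "avgpool"]


def sample_architecture(mu, sigma, rng):
    """Draw one discrete architecture from per-edge Gaussian architecture weights.

    mu[e][k] and sigma[e][k] are the mean and standard deviation of the weight
    for operation k on edge e. The sampled weights are discretized per edge by argmax.
    """
    arch = []
    for edge_mu, edge_sigma in zip(mu, sigma):
        weights = [rng.gauss(m, s) for m, s in zip(edge_mu, edge_sigma)]
        arch.append(OPS[weights.index(max(weights))])
    return tuple(arch)


def select_best(mu, sigma, proxy_score, n_samples=8, seed=0):
    """Sample a group of architectures and return the one the proxy scores highest."""
    rng = random.Random(seed)
    candidates = {sample_architecture(mu, sigma, rng) for _ in range(n_samples)}
    return max(candidates, key=proxy_score)


def count_convs(arch):
    """Toy stand-in for a training-free proxy: number of convolution ops (assumption)."""
    return sum(op.startswith("conv") for op in arch)


mu = [[0.0, 2.0, 1.0, 0.0], [3.0, 0.0, 0.0, 0.0]]
sigma = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
best = select_best(mu, sigma, count_convs)
```

With all standard deviations set to zero, sampling collapses to the plain argmax over the means, which is the baseline the abstract contrasts against.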
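Low-light environments pose a formidable challenge for robust unmanned aerial vehicle (UAV) tracking, even with state-of-the-art (SOTA) trackers, because potential image features are hard to extract under adverse lighting conditions. Moreover, because of the low visibility, it is also extremely difficult for human monitors at ground control stations to select the object accurately online in order to initialize UAV tracking. To solve these problems, this work proposes a novel enhancer, HighlightNet, that lights up potential objects for both human operators and UAV trackers. By employing a Transformer, HighlightNet can adjust its enhancement parameters according to global features and is therefore adaptive to illumination variation. A pixel-level range mask is introduced to make HighlightNet focus more on enhancing the tracked object and regions without light sources. In addition, a soft truncation mechanism is built to prevent background noise from being mistaken for crucial features. Evaluations on image enhancement benchmarks show that HighlightNet has advantages in facilitating human perception. Experiments on the public UAVDark135 benchmark show that HighlightNet is better suited to UAV tracking tasks than other SOTA low-light enhancers. Furthermore, real-world tests on a typical UAV platform verify the practicality and efficiency of HighlightNet in applications related to nighttime aerial tracking. The code and demo videos are available at https://github.com/vision4robotics/highlightnet.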
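A common approach to testing for fairness issues in text-based classifiers is to use counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that do not take into account grammar, context, or subtle references to sensitive attributes, and they may miss issues that the wordlist creators did not consider. In this paper, we introduce the task of generating counterfactuals that overcome these shortcomings and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.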
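The wordlist-style baseline that the abstract above criticizes, together with the counterfactual test itself, can be sketched in a few lines. The swap list `SWAPS`, the decision threshold, and the deliberately biased toy classifier are assumptions for illustration; they are not taken from the paper.

```python
import re

# Hypothetical toy wordlist of sensitive-attribute terms and their swaps.
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}


def wordlist_counterfactual(text, swaps=SWAPS):
    """Swap each listed word for its counterpart, keeping initial capitalization."""

    def repl(match):
        word = match.group(0)
        new = swaps.get(word.lower(), word)
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", repl, text)


def output_changes(classifier, text, threshold=0.5):
    """Return (changed, counterfactual): does the thresholded decision flip?"""
    counterfactual = wordlist_counterfactual(text)
    original_label = classifier(text) >= threshold
    flipped_label = classifier(counterfactual) >= threshold
    return original_label != flipped_label, counterfactual


def biased_toy_classifier(text):
    """Deliberately unfair stand-in scorer: high score whenever 'she' appears."""
    return 0.9 if "she" in text.lower().split() else 0.1


changed, counterfactual = output_changes(biased_toy_classifier, "He is loud")
```

The blind `her` to `his` swap also shows the limitation named in the abstract: "I saw her" becomes the ungrammatical "I saw his", because a wordlist cannot tell the possessive from the object pronoun.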
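With the emergence of various facial manipulation techniques, face forgery detection has drawn growing attention because of security concerns. Previous works always formulate face forgery detection as a classification problem based on cross-entropy loss, which emphasizes category-level differences rather than the essential discrepancies between real and fake faces, limiting model generalization to unseen domains. To address this issue, we propose a novel face forgery detection framework, named Dual Contrastive Learning (DCL), which specially constructs positive and negative paired data and performs designed contrastive learning at different granularities to learn generalized feature representations. Specifically, combined with a hard sample selection strategy, a task-related contrastive learning strategy is first proposed to promote the learning of task-related discriminative features by specially constructing instance pairs. Moreover, to further explore the essential discrepancies, intra-instance contrastive learning (Intra-ICL) is introduced to focus on the local content inconsistencies prevalent in forged faces by constructing local-region pairs inside instances. Extensive experiments and visualizations on several datasets demonstrate the generalization of our method against state-of-the-art competitors.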
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
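To make the idea of distilling token relations (finding 1 above) more concrete, here is a minimal sketch. The abstract does not define how relations are computed, so this assumes one common choice: a row-wise softmax over scaled dot products between token features, matched between teacher and student with a KL divergence. The function names and toy inputs are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens):
    """N x N relation matrix: row i is softmax of scaled dot products with token i."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(ti, tj)) / scale for tj in tokens])
        for ti in tokens
    ]


def relation_kl(teacher_tokens, student_tokens, eps=1e-12):
    """Mean KL(teacher relations || student relations) over token rows."""
    t_rel = token_relations(teacher_tokens)
    s_rel = token_relations(student_tokens)
    total = 0.0
    for t_row, s_row in zip(t_rel, s_rel):
        total += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(t_row, s_row))
    return total / len(t_rel)


# Two tokens each; teacher width 2, student width 3 (toy values).
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
loss = relation_kl(teacher, student)
```

Because the relation matrices are N x N regardless of feature width, this kind of target can be compared directly even when the student is narrower than the teacher, without an extra projection layer.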
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.